Abstract

We present an improved algorithm for solving symmetric diagonally dominant linear systems. On
input of an $n \times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$ such
that $A\bar{x} = b$ for some (unknown) vector $\bar{x}$, our algorithm computes a vector $x$ such that
$\|x - \bar{x}\|_A < \epsilon \|\bar{x}\|_A$ ^1 in time

$$\tilde{O}(m \log n \log(1/\epsilon)).$$ ^2

The solver utilizes in a standard way a 'preconditioning' chain of progressively sparser graphs. To
claim the faster running time we make a two-fold improvement in the algorithm for constructing the
chain. The new chain exploits previously unknown properties of the graph sparsification algorithm given
in [Koutis, Miller, Peng, FOCS 2010], allowing for stronger preconditioning properties. We also present
an algorithm of independent interest that constructs nearly-tight low-stretch spanning trees in time
$\tilde{O}(m \log n)$, a factor of $O(\log n)$ faster than the algorithm in [Abraham, Bartal, Neiman, FOCS 2008].
This speedup directly reflects on the construction time of the preconditioning chain.

1 Introduction

Solvers for symmetric diagonally dominant (SDD) ^3 systems are a crucial component of the fastest known
algorithms for a multitude of problems that include (i) Computing the first non-trivial (Fiedler)
eigenvector of the graph, with well known applications to the sparsest-cut problem [Fie73, ST96, Chu97]; (ii)
Generating spectral sparsifiers that also act as cut-preserving sparsifiers [SS08]; (iii) Solving linear systems
derived from elliptic finite element discretizations of a significant class of partial differential equations
[BHV04]; (iv) Generalized lossy flow problems [SD08]; (v) Generating random spanning trees [KM09]; (vi)
Faster maximum flow algorithms [CKM+11]; and (vii) Several optimization problems in computer vision
[KMST09b, KMT11] and graphics [MP08, JMD+07].

These algorithmic advances were largely motivated by the seminal work of Spielman and Teng who
gave the first nearly-linear time solver for SDD systems [ST04, EEST05, ST06]. The running time of their
solver is a large number of polylogarithmic factors away from the obvious linear time lower bound. In
recent work, building upon further work of Spielman and Srivastava [SS08], we presented a simpler and
faster SDD solver with a run time of $\tilde{O}(m \log^2 n \log \epsilon^{-1})$, where $m$ is the number of nonzero entries, $n$ is
the number of variables, and $\epsilon$ is a standard measure of the approximation error [KMP10a].

It has been conjectured that the algorithm of [KMP10a] is not optimal [Spi10b, Ten10, Spi10a]. In this
paper we give an affirmative answer by presenting a solver that runs in $\tilde{O}(m \log n \log \epsilon^{-1})$ time.



Partially supported by the National Science Foundation under grant number CCF-1018463.
^1 $\|\cdot\|_A$ denotes the $A$-norm.
^2 The $\tilde{O}$ notation hides a $(\log \log n)^2$ factor.
^3 A system $Ax = b$ is SDD when $A$ is symmetric and $A_{ii} \geq \sum_{j \neq i} |A_{ij}|$.

The $O(\log n)$ speedup of the SDD solver applies to all algorithms listed above, and we believe that
it will prove to be quite important in practice, as applications of SDD solvers frequently involve massive
graphs [Ten10].

1.1 Overview of our techniques 

The key to all known near-linear work SDD solvers is spectral graph sparsification, which on a given input 
graph G constructs a sparser graph H such that G and H are 'spectrally similar' in the condition number 
sense, defined in Section 2. Spectral graph sparsification can be seen as a significant strengthening of the 
notion of cut-preserving sparsification [BK96] . 

The new solver follows the framework of recursive preconditioned Chebyshev iterations [ST06, KMP10a].
The iterations are driven by a so-called preconditioning chain $\{G_1, H_1, G_2, H_2, \ldots\}$ of graphs, where $H_i$
is a spectral sparsifier for $G_i$ and $G_{i+1}$ is generated by contracting $H_i$ via a greedy elimination of degree 1
and 2 nodes. The total work of the solver includes the time for constructing the chain, and the work spent
on actual iterations, which is a function of the preconditioning quality of the chain. The preconditioning
quality of the chain in turn depends on the guarantees of the sparsification algorithm.

More concretely, all sparsification routines that have been used in SDD solvers conform to the same
template: on input a graph $G$ with $n$ vertices and $m$ edges, the routine returns a graph $H$ with
$n + \tilde{O}(m \log^c n)/\kappa$ edges such that the condition number of the Laplacians of $G$ and $H$ is $\kappa$.
In all known SDD solvers the factor $O(\log^c n)$ appears directly in the running time of the SDD solver.
In particular the solver of [KMP10b] was based on a sparsification routine for which $c = 2$.

The optimism that SDD systems can be solved in time $\tilde{O}(m \log n \log \epsilon^{-1})$ has mainly been based on
the result of Kolla et al. [KMST09a] who proved that there is a polynomial (but far from nearly-linear)
time algorithm that returns a sparsifier with $c = 1$. However, our new solver is instead based on a slight
modification and a deeper analysis of the sparsification algorithm in [KMP10a] which enables a subtler
chain construction.

The incremental sparsification algorithm in [KMP10a] computes and keeps in $H$ a properly scaled copy
of a low-stretch spanning tree of $G$, and adds to $H$ a number of off-tree samples from $G$. The key enabling
observation in the new analysis is that the total stretch of the off-tree edges is essentially invariant under
sparsification. In other words, the total stretch of the off-tree edges in $H_i$ is at most equal to that in $G_i$.
The total stretch is invariant under the graph contraction process as well. The elimination process that
generates $G_{i+1}$ from $H_i$ naturally generates a spanning tree for $G_{i+1}$. The total stretch of the off-tree edges
in $G_{i+1}$ is at most equal to that in $H_i$. This effectively allows us to compute only one low-stretch spanning
tree for the first graph in the chain, and keep the same tree for the rest of the chain. This is a significant
departure from previous constructions, where a low-stretch spanning tree had to be calculated for each $G_i$.

The ability to keep the same low-stretch spanning tree for the whole chain allows us to prove that
Laplacians of spine-heavy graphs, i.e. graphs with a spanning tree with average stretch $O(1/\log n)$, can
be solved in linear time. This average stretch is a factor of $\tilde{O}(\log^2 n)$ smaller than what is true for general
graphs. We reduce the first general graph $G_1$ into a spine-heavy graph $G_2$ by scaling up the edges of its
low-stretch spanning tree by a factor of $\tilde{O}(\log^2 n)$. This results in the construction of a preconditioner
chain with a skewed set of condition numbers. That is, the condition number of the pair $(G_i, H_i)$ is a
fixed constant with the exception of $(G_1, H_1)$ for which it is $\tilde{O}(\log^2 n)$. In all previous solvers the condition
number for the pair $(G_i, H_i)$ was a uniform function of the size of $G_i$.

An additional significant departure from previous constructions is in the way that the number of edges
decreases between subsequent $G_i$'s in the chain. For example, in the [KMP10a] chain the number of edges
in $G_{i+1}$ is always at least a factor of $O(\log^2 n)$ smaller than the number of edges in $G_i$. In the chain
presented in this paper irregular decreases are possible; for example a big drop in the number of edges may
occur between $G_2$ and $G_3$ and the progress may stagnate for a while after $G_3$, until it starts again.

In order to analyze this new chain we view the graphs $H_i$ as multi-graphs or graphs of samples. In
the sampling procedure that generates $H_i$, some off-tree edges of $G_i$ can be sampled multiple times, and
so $H_i$ is naturally a multi-graph, where the weight of a 'traditional' edge $e$ is split among a number of
parallel multi-edges with the same endpoints. The progress of the overall sparsification in the chain is then
monitored in terms of the number of multi-edges in the $H_i$'s. In other words, when the algorithm appears
to be stagnated in terms of the edge count in the $G_i$'s, progress is still happening by 'thinning' the off-tree
edges. The details are given in Section 4.

The final bottleneck to getting an $\tilde{O}(m \log n)$ algorithm for very sparse systems is the
$O(m \log n + n \log^2 n)$ running time of the algorithm for constructing a low-stretch spanning tree [ABN08, EEST05].
We address the problem by noting that it suffices to find a low-stretch spanning tree on a graph with
edge weights that are roughly powers of 2. In this special setting, the shortest path like ball/cone growing
routines in [ABN08, EEST05] can be sped up in a way similar to the technique used in [OMSW10]. We
also slightly improve the result of [OMSW10], which may be of independent interest.
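
To make the "weights that are roughly powers of 2" setting concrete, here is a minimal Python sketch. It assumes one plausible reading, namely rounding every weight down to the nearest power of two; the helper names are hypothetical and the paper's own rounding rule may differ. Each weight changes by a factor of less than 2, so every stretch value $w_e \sum_i 1/w(e_i)$ changes by a factor of less than 2 in either direction, while the number of distinct weights can shrink considerably.

```python
import math

def round_down_to_power_of_two(w):
    """Largest power of two that is <= w, for w > 0 (hypothetical helper)."""
    if w <= 0:
        raise ValueError("edge weights must be positive")
    # frexp returns (mantissa, exponent) with w == mantissa * 2**exponent
    # and 0.5 <= mantissa < 1, so 2**(exponent - 1) <= w < 2**exponent.
    _, exponent = math.frexp(w)
    return math.ldexp(0.5, exponent)  # 0.5 * 2**exponent == 2**(exponent - 1)

def round_weights(edges):
    """Apply the rounding to a list of (u, v, w) edges."""
    return [(u, v, round_down_to_power_of_two(w)) for u, v, w in edges]

edges = [(0, 1, 5.0), (1, 2, 8.0), (0, 2, 0.3), (2, 3, 7.9)]
rounded = round_weights(edges)
distinct = sorted({w for _, _, w in rounded})
```

In this toy input four distinct weights collapse to three; on graphs with polynomially bounded weight ratios the number of distinct rounded values is logarithmic in that ratio.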

2 Background and notation 

A matrix $A$ is symmetric diagonally dominant if it is symmetric and $A_{ii} \geq \sum_{j \neq i} |A_{ij}|$. It is well understood
that any linear system whose matrix is SDD is easily reducible to a system whose matrix is the Laplacian
of a weighted graph with positive weights [Gre96]. The Laplacian matrix of a graph $G = (V,E,w)$ is the
matrix defined as

$$L_G(i,j) = -w_{i,j} \quad \text{and} \quad L_G(i,i) = \sum_{j \neq i} w_{i,j}.$$

There is a one-to-one correspondence between graphs and Laplacians which allows us to extend some
algebraic operations to graphs. Concretely, if $G$ and $H$ are graphs, we will denote by $G + H$ the graph
whose Laplacian is $L_G + L_H$, and by $cG$ the graph whose Laplacian is $cL_G$.

Definition 2.1 [Spectral ordering of graphs]

We define a partial ordering $\preceq$ of graphs by letting

$$G \preceq H \text{ if and only if } x^T L_G x \leq x^T L_H x,$$

for all real vectors $x$.

If there is a constant $c$ such that $G \preceq cH \preceq \kappa G$, we say that the condition of the pair $(G, H)$ is $\kappa$. In
our proofs we will find it useful to view a graph $G = (V, E, w)$ as a graph with multiple edges.
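
The Laplacian definition and the quadratic form behind the spectral ordering can be checked on a small example. The following Python sketch is illustrative only (function names are assumptions, and a dense list-of-lists matrix is used for simplicity). It builds $L_G$ from an edge list and confirms the standard identity $x^T L_G x = \sum_{(u,v)} w_{u,v}(x_u - x_v)^2$, which also shows why dropping an edge can only decrease the quadratic form.

```python
def laplacian(n, edges):
    """Dense Laplacian of a weighted graph given as (u, v, w) triples."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L

def quad_form(L, x):
    """Compute x^T L x for a dense matrix L."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

triangle = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
x = [1.0, 0.0, -1.0]

L_G = laplacian(3, triangle)
energy = sum(w * (x[u] - x[v]) ** 2 for u, v, w in triangle)

# H is G with the edge (0, 2) removed, so x^T L_H x <= x^T L_G x for every x.
L_H = laplacian(3, triangle[:2])
```

For this vector both `quad_form(L_G, x)` and `energy` evaluate to 15, and the subgraph gives 3, consistent with $H \preceq G$ on this test vector (a single vector does not prove the ordering, it only illustrates it).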

Definition 2.2 [Graph of samples]

A graph $G = (V, E, w)$ is called a graph of samples, when each edge $e$ of weight $w_e$ is considered as a sum
of a set $\mathcal{L}_e$ of parallel edges, each of weight $w_l = w_e/|\mathcal{L}_e|$. When needed we will emphasize the fact that a
graph is viewed as having parallel edges, by using the notation $G = (V, \mathcal{L}, w)$.

Definition 2.3 [Stretch of edge by tree]

Let $T = (V, E_T, w)$ be a tree. For $e \in E_T$ let $w'_e = 1/w_e$. Let $e$ be an edge not necessarily in $E_T$, of weight
$w_e$. If the unique path connecting the endpoints of $e$ in $T$ consists of edges $e_1 \ldots e_k$, the stretch of $e$ by $T$
is defined to be

$$\mathrm{stretch}_T(e) = \frac{\sum_{i=1}^{k} w'_{e_i}}{w'_e}.$$

A key to our results is viewing graphs as resistive electrical networks [DS00]. More concretely, if
$G = (V, \mathcal{L}, w)$ each $l \in \mathcal{L}$ corresponds to a resistor of resistance $1/w_l$ connecting the two endpoints of $l$. We
denote by $R_G(e)$ the effective resistance between the endpoints of $e$ in $G$. The effective resistance on
trees is easy to calculate; we have $R_T(e) = \sum_{i=1}^{k} 1/w(e_i)$. Thus

$$\mathrm{stretch}_T(e) = w_e R_T(e).$$

We extend the definition to $l \in \mathcal{L}_e$ in the natural way

$$\mathrm{stretch}_T(l) = w_l R_T(e),$$

and note that $\mathrm{stretch}_T(e) = \sum_{l \in \mathcal{L}_e} \mathrm{stretch}_T(l)$.

This definition can also be extended to sets of edges. Thus $\mathrm{stretch}_T(E)$ denotes the vector of stretch
values of all edges in $E$. We also let $\mathrm{stretch}_T(G)$ denote the vector of stretch values for edges in $E_G - E_T$.

Definition 2.4 [Total Off-Tree Stretch]

Let $G = (V, E_G, w)$ be a graph, $T = (V, E_T, w)$ be a spanning tree of $G$. We define

$$|\mathrm{stretch}_T(G)| = \sum_{e \in E_G - E_T} \mathrm{stretch}_T(e).$$
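
The definitions above translate directly into a few lines of code. The sketch below is a straightforward, unoptimized Python illustration (names are assumptions; it walks the tree per query rather than using the off-line LCA approach mentioned later in the paper). It computes $R_T(e)$ as the sum of $1/w$ along the tree path and the stretch as $w_e R_T(e)$.

```python
def tree_resistance(tree_edges, u, v):
    """Effective resistance between u and v in a tree: sum of 1/w on the unique path."""
    adj = {}
    for a, b, w in tree_edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    # Depth-first walk from u, carrying the resistance accumulated so far.
    stack = [(u, None, 0.0)]
    while stack:
        node, parent, r = stack.pop()
        if node == v:
            return r
        for nxt, w in adj.get(node, []):
            if nxt != parent:
                stack.append((nxt, node, r + 1.0 / w))
    raise ValueError("u and v are not connected in the tree")

def stretch(tree_edges, edge):
    """stretch_T(e) = w_e * R_T(e) for an edge e = (u, v, w_e)."""
    u, v, w = edge
    return w * tree_resistance(tree_edges, u, v)

def total_off_tree_stretch(tree_edges, off_tree_edges):
    """|stretch_T(G)|: sum of stretches over the off-tree edges."""
    return sum(stretch(tree_edges, e) for e in off_tree_edges)

tree = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 4.0)]
off_tree = [(0, 3, 1.0), (0, 2, 2.0)]
total = total_off_tree_stretch(tree, off_tree)
```

Here the edge $(0,3)$ of weight 1 has tree-path resistance $1 + 1/2 + 1/4$, hence stretch 1.75; the edge $(0,2)$ of weight 2 has stretch $2 \cdot 1.5 = 3$; the total off-tree stretch is 4.75. A tree edge always has stretch exactly 1.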

3 Incremental Sparsifier 

In their remarkable work [SS08], Spielman and Srivastava analyzed a spectral sparsification algorithm based
on a simple sampling procedure. The sampling probabilities were proportional to the effective resistances
$R_G(e)$ of the edges on the input graph $G$. Our solver in [KMP10a] was based on an incremental
sparsification algorithm which used upper bounds on the effective resistances that are more easily calculated. In
this section we give a more careful analysis of the incremental sparsifier algorithm given in [KMP10a].

We start by reviewing the basic Sample procedure. The procedure takes as input a weighted graph
$G$ and frequencies $p'_e$ for each edge $e$. These frequencies are normalized to probabilities $p_e$ summing to 1.
It then picks in $q$ rounds exactly $q$ samples which are weighted copies of the edges. The probability that
a given edge $e$ is picked in a given round is $p_e$. The weight of the corresponding sample is set so that the
expected weight of the edge $e$ after sampling is equal to its actual weight in the input graph. The details
are given in the following pseudocode.



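Sample

Input: Graph $G = (V,E,w)$, $p' : E \to \mathbb{R}^+$, real $\xi$.

Output: Graph $G' = (V, \mathcal{L}, w')$.

t := $\sum_e p'_e$
q := $C_s t \log t \log(1/\xi)$ (* $C_s$ is an explicitly known constant *)
$p_e$ := $p'_e / t$

G' := $(V, \mathcal{L}, w')$ with $\mathcal{L} = \emptyset$
for q times do

    Sample one $e \in E$ with probability of picking $e$ being $p_e$

    Add sample of $e$, $l$ to $\mathcal{L}_e$ with weight $w'_l = w_e/(p_e q)$ (* Recall that $\mathcal{L} = \bigcup_{e \in E} \mathcal{L}_e$ *)
end for
return G'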
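
The pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the constant $C_s$ is not specified in the text, so `c_s=1.0` is a placeholder; $q$ is rounded up to an integer; a fixed seed is used for reproducibility; and the frequencies in the example are chosen for illustration rather than derived from effective resistances.

```python
import math
import random

def sample(edges, freqs, xi, c_s=1.0, seed=0):
    """Draw q weighted samples; edges are (u, v, w) and freqs are the p'_e values."""
    rng = random.Random(seed)
    t = sum(freqs)
    # q = C_s * t * log t * log(1/xi), rounded up; at least one round is drawn.
    q = max(1, math.ceil(c_s * t * math.log(t) * math.log(1.0 / xi)))
    probs = [f / t for f in freqs]
    picks = rng.choices(range(len(edges)), weights=probs, k=q)
    # A sample of edge e gets weight w_e / (p_e * q), so its expected total is w_e.
    samples = [(i, edges[i][2] / (probs[i] * q)) for i in picks]
    return samples, q

edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
freqs = [1.0, 2.0, 3.0]  # illustrative frequencies, here proportional to the weights
samples, q = sample(edges, freqs, xi=0.1)
total_weight = sum(w for _, w in samples)
```

Because the example frequencies are proportional to the weights, every sample has the same weight $t/q$ and the total sampled weight equals the total input weight regardless of which edges are drawn. With general frequencies only the expectation is preserved.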



The following Theorem characterizes the quality of $G'$ as a spectral sparsifier for $G$ and it was proved
in [KMP10a].

Theorem 3.1 (Oversampling) Let $G = (V,E,w)$ be a graph. Assuming that $p'_e \geq w_e R_G(e)$ for each
edge $e \in E$, and $\xi \in \Omega(1/n)$, the graph $G' = $ Sample$(G, p', \xi)$ satisfies

$$G \preceq 2G' \preceq 3G$$

with probability at least $1 - \xi$.

Suppose we are given a spanning tree $T$ of $G = (V,E,w)$. The incremental sparsification algorithm
of [KMP10a] was based on two key observations: (a) By Rayleigh's monotonicity law [DS00] we have
$R_T(e) \geq R_G(e)$ because $T$ is a subgraph of $G$. Hence the numbers $\mathrm{stretch}_T(e)$ satisfy the condition of
Theorem 3.1 and they can be used in Sample. (b) Scaling up the edges of $T$ in $G$ by a factor of $\kappa$ gives a
new graph $G'$ where the stretches of the off-tree edges are smaller by a factor of $\kappa$ relative to those in $G$. This
forces Sample (when applied on $G'$) to sample edges from $T$ more often, and return a graph with a smaller
number of off-tree edges. In other words, the scale-up factor $\kappa$ allows us to control the number of off-tree
edges. Of course this comes at the cost of incurring condition $\kappa$ between $G$ and $G'$.

In this paper we follow the same approach, but also modify IncrementalSparsify so that the output
graph is a union of a copy of $T$ and the off-tree samples picked by Sample. To emphasize this, we will
denote the edge set of the output graph by $E_T \cup \mathcal{L}$. The details are given in the following algorithm.



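IncrementalSparsify

Input: Graph $G = (V, E, w)$, edge-set $E_T$ of spanning tree $T$, reals $\kappa > 1$, $0 < \xi < 1$
Output: Graph $H = (V, E_T \cup \mathcal{L})$ or FAIL

1: Calculate $\mathrm{stretch}_T(G)$

2: if $|\mathrm{stretch}_T(G)| \leq 1$ then
3:     return $2T$

4: end if

5: $T' := \kappa T$.

6: $G' := G + (\kappa - 1)T$ (* $G'$ is the graph obtained from $G$ by replacing $T$ by $T'$ *)

7: $\hat{t} := |\mathrm{stretch}_{T'}(G')|$ (* $\hat{t} = |\mathrm{stretch}_T(G)|/\kappa$ *)

8: $t = \hat{t} + n - 1$ (* total stretch including tree edges *)

9: $\tilde{H} = (V, \tilde{\mathcal{L}}) := $ Sample$(G', \mathrm{stretch}_{T'}(E'), \xi)$

10: if $\left(\sum_{e \notin E_T} |\tilde{\mathcal{L}}_e|\right) \geq 2(\hat{t}/t)\, C_s t \log t \log(1/\xi)$ then (* $C_s$ is the constant in Sample *)
11:     return FAIL
12: end if

13: $\mathcal{L} := \tilde{\mathcal{L}} - \bigcup_{e \in E_T} \tilde{\mathcal{L}}_e$.

14: $H := \mathcal{L} + 3T'$
15: return $4H$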
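
Steps 5 to 8 rely on the fact that scaling the tree by $\kappa$ divides every off-tree stretch by $\kappa$. The short Python sketch below illustrates this on a path-shaped spanning tree, which keeps the resistance computation to a one-line sum. The function names and the restriction to a path tree are assumptions made for brevity, not part of the algorithm.

```python
def path_stretch(path_weights, edge):
    """Stretch of edge (u, v, w) by the path tree 0-1-...-n.

    path_weights[i] is the weight of the tree edge (i, i + 1).
    """
    u, v, w = edge
    lo, hi = min(u, v), max(u, v)
    return w * sum(1.0 / path_weights[i] for i in range(lo, hi))

def scaled_tree_stats(path_weights, off_tree, kappa):
    """Mirror Steps 5-8: scale the tree by kappa and return (t_hat, t)."""
    scaled = [kappa * w for w in path_weights]
    t_hat = sum(path_stretch(scaled, e) for e in off_tree)
    n = len(path_weights) + 1
    t = t_hat + n - 1  # each of the n - 1 tree edges has stretch exactly 1
    return t_hat, t

path_weights = [1.0, 2.0, 4.0]
off_tree = [(0, 3, 1.0), (0, 2, 2.0)]
base = sum(path_stretch(path_weights, e) for e in off_tree)
t_hat, t = scaled_tree_stats(path_weights, off_tree, kappa=4.0)
```

With $\kappa = 4$ the total off-tree stretch drops from 4.75 to 1.1875, so off-tree edges account for a smaller share $\hat{t}/t$ of the sampling mass and fewer off-tree samples are expected.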



Theorem 3.2 Let $G$ be a graph with $n$ vertices and $m$ edges and $T$ be a spanning tree of $G$. Then
for $\xi \in \Omega(1/n)$, IncrementalSparsify$(G, E_T, \kappa, \xi)$ computes with probability at least $1 - 2\xi$ a graph
$H = (V, E_T \cup \mathcal{L})$ such that

• $G \preceq H \preceq 54\kappa G$

• $|\mathcal{L}| \leq 2\hat{t}\, C_s \log t \log(1/\xi)$

where $\hat{t} = |\mathrm{stretch}_T(G)|/\kappa$, $t = \hat{t} + n - 1$, and $C_s$ is the constant in Sample. The algorithm can be
implemented to run in $\tilde{O}((n \log n + \hat{t} \log^2 n) \log(1/\xi))$.

Proof We first suppose that $|\mathrm{stretch}_T(G)| \leq 1$ holds. Thus $G/2 \preceq T \preceq G$, by well known facts [BH03].
Therefore returning $H = 2T$ satisfies the claims. Now assume that the condition is not true. Since
in Step 6 the weight of each tree edge is increased by at most a factor of $\kappa$, we have $G \preceq G' \preceq \kappa G$.
IncrementalSparsify sets $p'_e = 1$ if $e \in E_T$ and $\mathrm{stretch}_T(e)/\kappa$ otherwise, and invokes Sample to
compute a graph $\tilde{H}$ such that with probability at least $1 - \xi$, we get

$$G \preceq G' \preceq 2\tilde{H} \preceq 3G' \preceq 3\kappa G. \qquad (3.1)$$

We now bound the number $|\mathcal{L}|$ of off-tree samples drawn by Sample. For the number $t$ used in Sample
we have $t = \hat{t} + n - 1$ and $q = C_s t \log t \log(1/\xi)$ is the number of samples drawn by Sample. Let $X_i$ be a
random variable which is 1 if the $i$-th sample picked by Sample is a non-tree edge and 0 otherwise. The
total number of non-tree samples is the random variable $X = \sum_{i=1}^{q} X_i$, and its expected value can be
calculated using the fact $\Pr(X_i = 1) = \hat{t}/t$:

$$E[X] = q\,\frac{\hat{t}}{t} = \hat{t}\,\frac{C_s t \log t \log(1/\xi)}{t} = C_s \hat{t} \log t \log(1/\xi).$$

The check in Steps 10-12 ensures that $H$ does not contain more than $2E[X]$ off-tree samples, so the claim
about the number of off-tree samples is automatically satisfied. A standard form of Chernoff's inequality is:

$$\Pr[X > (1 + \delta)E[X]] < \exp(-\delta^2 E[X])$$
$$\Pr[X < (1 - \delta)E[X]] < \exp(-\delta^2 E[X]).$$

Letting $\delta = 1$, and since $t > 1$, $C_s > 2$ we get $\Pr[X > 2E[X]] < \exp(-2E[X]) < 1/n^2$. So, the probability
that the algorithm returns a FAIL is at most $1/n^2$. It follows that the probability that an output of Sample
satisfies inequality 3.1 and doesn't get rejected by IncrementalSparsify is at least $1 - \xi - 1/n^2$.

We now concentrate on the edges of $T$. Any fixed edge $e \in E_T$ is sampled with probability $1/t$ in
Sample. Let $X_e$ denote the random variable equal to the number of times $e$ is sampled. Since there are
$q = C_s t \log t \log(1/\xi)$ iterations of sampling, we have $E[X_e] = q/t \geq C_s \log n$. By the Chernoff inequalities
above, setting $\delta = 1/2$ we get that

$$\Pr[X_e > (3/2)E[X_e]] < \exp(-(C_s/4) \log n)$$

and

$$\Pr[X_e < (1/2)E[X_e]] < \exp(-(C_s/4) \log n).$$

By setting $C_s$ to be large enough we get $\exp(-(C_s/4) \log n) < n^{-4}$. So with probability at least $1 - 1/n^2$
there is no edge $e \in E_T$ such that $X_e > (3/2)E[X_e]$ or $X_e < (1/2)E[X_e]$. Therefore we get that with
probability at least $1 - 1/n^2$ all the edges $e \in E_T$ in $H$ have weights at most three times larger than their
weights in $2\tilde{H}$, and

$$G \preceq 2\tilde{H} \preceq 4H \preceq 18 \cdot 2\tilde{H} \preceq 54\kappa G.$$

Overall, the probability that the output $H$ of IncrementalSparsify satisfies the claim about the
condition number is at least $1 - \xi - 2/n^2 \geq 1 - 2\xi$.

We now consider the time complexity. We first compute the effective resistance of each non-tree edge by
the tree. This can be done using Tarjan's off-line LCA algorithm [Tar79], which takes $O(m)$ time [GT83].
We next call Sample, which draws a number of samples. Since the samples from $E_T$ don't affect the
output of IncrementalSparsify we can implement Sample to exploit this; we split the interval $[0, 1]$
into two non-overlapping intervals with lengths corresponding to the probability of picking an edge from $E_T$
and $E - E_T$. We further split the second interval by assigning each edge in $E - E_T$ a sub-interval
of length corresponding to its probability, so that no two intervals overlap. At each sampling iteration
we pick a random value in $[0, 1]$ and in $O(1)$ time we decide if the value falls in the interval associated
with $E - E_T$. If not, we do nothing. If yes, we do a binary search taking $O(\log n)$ time in order to find
the sub-interval that contains the value. With the given input Sample draws at most $O(\hat{t} \log n \log(1/\xi))$
samples from $E - E_T$ and for each such sample it does $O(\log n)$ work. It also does $O(n \log n \log(1/\xi))$ work
rejecting the samples from $E_T$. Thus the cost of the call to Sample is $\tilde{O}((n \log n + \hat{t} \log^2 n) \log(1/\xi))$. ■
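
The interval-splitting implementation described above can be sketched with prefix sums and a binary search. The Python below is a minimal illustration with hypothetical names; it takes the total tree probability mass and the list of off-tree probabilities as inputs, and returns `None` for draws that land on the tree interval.

```python
import bisect
import itertools

def make_off_tree_sampler(tree_mass, off_tree_probs):
    """Build draw(r) for r in [0, 1).

    [0, tree_mass) is the interval of the tree edges; the rest is split into
    one sub-interval per off-tree edge, in the order given.
    """
    prefix = list(itertools.accumulate(off_tree_probs))

    def draw(r):
        if r < tree_mass:
            return None  # O(1) rejection: the sample is a tree edge
        # Binary search for the sub-interval containing r.
        idx = bisect.bisect_right(prefix, r - tree_mass)
        # Guard against floating-point round-off at the right end of [0, 1).
        return min(idx, len(off_tree_probs) - 1)

    return draw

draw = make_off_tree_sampler(tree_mass=0.5, off_tree_probs=[0.2, 0.3])
```

In a full implementation `r` would come from a random number generator once per sampling round; passing it in explicitly keeps the sketch deterministic. Tree samples cost constant time and only off-tree samples pay the logarithmic search, which is what the running time analysis uses.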

Since the weights of the tree-edges $E_T$ in $H$ are different from those in $G$, we will use $T_H$ to denote the
spanning tree of $H$ whose edge-set is $E_T$. We now show a key property of IncrementalSparsify.

Lemma 3.3 (Uniform Sample Stretch) Let $H = (V, E_T \cup \mathcal{L}, w) := $ IncrementalSparsify$(G, E_T, \kappa, \xi)$,
and $C_s$, $t$ as defined in Theorem 3.2. For all $l \in \mathcal{L}$, we have

$$\mathrm{stretch}_{T_H}(l) = \frac{1}{3 C_s \log t \log(1/\xi)}.$$

Proof Let $T' = \kappa T$. Consider an arbitrary non-tree edge $e$ of $G'$ defined in Step 5 of
IncrementalSparsify. The probability of it being sampled is:

$$p_e = \frac{w_e R_{T'}(e)}{t},$$

where $R_{T'}(e)$ is the effective resistance of $e$ in $T'$ and $t = n - 1 + |\mathrm{stretch}_{T'}(G')| = n - 1 + |\mathrm{stretch}_T(G)|/\kappa$ is the
total stretch of all $G'$ edges by $T'$. If $e$ is picked, the corresponding sample $l$ has weight $w_e$ scaled up by a
factor of $1/p_e$, but then divided by $q$ at the end. This gives

$$w_l = \frac{w_e}{p_e} \cdot \frac{1}{q} = \frac{w_e}{(w_e R_{T'}(e))/t} \cdot \frac{1}{C_s t \log t \log(1/\xi)} = \frac{1}{C_s R_{T'}(e) \log t \log(1/\xi)}.$$

So the stretch of $l$ with respect to $T'$ is independent of $w_e$ and equal to

$$\mathrm{stretch}_{T'}(l) = w_l R_{T'}(e) = \frac{1}{C_s \log t \log(1/\xi)}.$$

Finally note that $T_H = 3T'$. This proves the claim. ■
4 Solving using Incremental Sparsifiers 

We follow the framework of the solvers in [ST06] and [KMP10a] which consist of two phases. The
preconditioning phase builds a chain of graphs $\mathcal{C} = \{G_1, H_1, G_2, \ldots, H_d\}$ starting with $G_1 = G$, along with
a corresponding list of positive numbers $\mathcal{K} = \{\kappa_1, \ldots, \kappa_{d-1}\}$ where $\kappa_i$ is an upper bound on the
condition number of the pair $(G_i, H_i)$. The process for building $\mathcal{C}$ alternates between calls to a sparsification
routine (in our case IncrementalSparsify) which constructs $H_i$ from $G_i$ and a routine
GreedyElimination which constructs $G_{i+1}$ from $H_i$, by applying a greedy elimination of degree 1 and 2 nodes. The
preconditioning phase is independent of the $b$-side of the system $L_A x = b$. The solve phase passes $\mathcal{C}$, $b$
and a number of iterations $t$ (depending on a desired error $\epsilon$) to the recursive preconditioning algorithm
R-P-Chebyshev, described in [ST06] or in the appendix of [KMP10a].

We first give pseudocode for GreedyElimination, which deviates slightly from the standard
presentation where the input and output are the two graphs $G$ and $\hat{G}$, to include a spanning tree of the graphs.

GreedyElimination

Input: Graph $G = (V, E, w)$, Spanning tree $T$ of $G$
Output: Graph $\hat{G} = (\hat{V}, \hat{E}, \hat{w})$, Spanning tree $\hat{T}$ of $\hat{G}$

1: $\hat{G} := G$

2: $E_{\hat{T}} := E_T$

3: repeat

4:     greedily remove all degree-1 nodes from $\hat{G}$

5:     if $\deg_{\hat{G}}(v) = 2$ and $(v,u_1), (v,u_2) \in E_{\hat{G}}$ then

6:         $w' := (1/w(u_1,v) + 1/w(u_2,v))^{-1}$

7:         $w'' := w(u_1,u_2)$ (* it may be the case that $w'' = 0$ *)

8:         replace the path $(u_1,v,u_2)$ by an edge $e$ of weight $w'$ in $\hat{G}$

9:         if $(u_1,v)$ or $(v,u_2)$ are not in $\hat{T}$ then

10:            Let $\hat{T} = \{\hat{T}\} - \{(u_1,v), (v,u_2), (u_1,u_2)\}$

11:        else

12:            Let $\hat{T} = \{\hat{T} \cup e\} - \{(u_1,v), (v,u_2), (u_1,u_2)\}$

13:        end if

14:    end if

15: until there are no nodes of degree 1 or 2 in $\hat{G}$

16: return $\{\hat{G}, \hat{T}\}$
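
The graph part of this elimination can be sketched in Python as below. This is a simplified illustration, not the full routine: the spanning-tree bookkeeping of Steps 9 to 13 is omitted, degree-1 and degree-2 vertices are processed from a single worklist rather than in the exact order of the pseudocode, and the adjacency-dictionary representation and function names are assumptions.

```python
def build_adj(edges):
    """Symmetric adjacency dictionary from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def greedy_elimination(adj):
    """Remove degree-1 vertices and splice out degree-2 vertices until none remain."""
    g = {v: dict(nbrs) for v, nbrs in adj.items()}
    work = [v for v in g if len(g[v]) in (1, 2)]
    while work:
        v = work.pop()
        if v not in g or len(g[v]) not in (1, 2):
            continue  # stale worklist entry
        nbrs = list(g[v].items())
        del g[v]
        for u, _ in nbrs:
            del g[u][v]
        if len(nbrs) == 2:
            (u1, w1), (u2, w2) = nbrs
            # Two edges in series behave like one edge of weight (1/w1 + 1/w2)^-1,
            # which is added to any existing edge between u1 and u2.
            w_series = 1.0 / (1.0 / w1 + 1.0 / w2)
            merged = g[u1].get(u2, 0.0) + w_series
            g[u1][u2] = merged
            g[u2][u1] = merged
        for u, _ in nbrs:
            if len(g[u]) in (1, 2):
                work.append(u)
    return g

# K4 on {0,1,2,3} with unit weights, a pendant path 3-4-5, and vertex 6 between 0 and 1.
k4 = [(a, b, 1.0) for a in range(4) for b in range(a + 1, 4)]
extra = [(3, 4, 2.0), (4, 5, 2.0), (0, 6, 2.0), (6, 1, 2.0)]
reduced = greedy_elimination(build_adj(k4 + extra))
```

On this input the pendant path disappears, vertex 6 is spliced out with series weight 1 and merged into the existing edge $(0,1)$, and the remaining graph is the 4-clique in which every vertex has degree 3.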



Of course we still need to prove that the output $\hat{T}$ is indeed a spanning tree. We prove the claim in the
following Lemma that also examines the effect of GreedyElimination on the total stretch of the off-tree
edges.

Lemma 4.1 Let $(\hat{G}, \hat{T}) := $ GreedyElimination$(G, T)$. The output $\hat{T}$ is a spanning tree of $\hat{G}$, and

$$|\mathrm{stretch}_{\hat{T}}(\hat{G})| \leq |\mathrm{stretch}_T(G)|.$$

Proof We prove the claim inductively by showing that it holds for all the pairs $(\hat{G}_i, \hat{T}_i)$ throughout the
loop, where $(\hat{G}_i, \hat{T}_i)$ denotes the pair $(\hat{G}, \hat{T})$ after the $i$-th elimination during the course of the algorithm.
The base of the induction is the input pair $(G, T)$ and so the claim holds for it.

When a degree-1 node gets eliminated the corresponding edge is necessarily in $\hat{T}_i$ by the inductive
hypothesis. Its elimination doesn't affect the stretch of any off-tree edge. So, it is clear that if $(\hat{G}_i, \hat{T}_i)$
satisfy the claim then after the elimination of a degree-1 node $(\hat{G}_{i+1}, \hat{T}_{i+1})$ will also satisfy the claim.

By the inductive hypothesis about $\hat{T}_i$ if $(v, u_1), (v, u_2)$ are eliminated then at least one of the two edges
must be in $\hat{T}_i$. We first consider the case where one of the two (say $(v,u_2)$) is not in $\hat{T}_i$. Both $u_1$ and $u_2$
must be connected to the rest of $\hat{G}_i$ through edges of $\hat{T}_i$ different than $(u_1,v)$ and $(v,u_2)$. Hence $\hat{T}_{i+1}$ is a
spanning tree of $\hat{G}_{i+1}$. Observe that we eliminate at most two non-tree edges from $\hat{G}_i$: $(v,u_2)$ and $(u_1,u_2)$
with corresponding weights $w(v,u_2)$ and $w''$ respectively. Let $\hat{T}[e]$ denote the unique tree-path between
the endpoints of $e$ in $\hat{T}$. The contribution of the two eliminated edges to the total stretch is equal to

$$s_1 = w(v,u_2) R_{\hat{T}_i}((v,u_2)) + w'' R_{\hat{T}_i}((u_1,u_2)).$$

The two eliminated edges get replaced by the edge $(u_1,u_2)$ with weight $w' + w''$. The contribution of the
new edge to the total stretch in $\hat{G}_{i+1}$ is equal to

$$s_2 = w' R_{\hat{T}_{i+1}}((u_1,u_2)) + w'' R_{\hat{T}_{i+1}}((u_1,u_2)).$$



We have $R_{\hat{T}_{i+1}}((u_1,u_2)) = R_{\hat{T}_i}((u_1,u_2)) < R_{\hat{T}_i}((v,u_2))$ since all the edges in the tree-path of $(u_1,u_2)$ are
not affected by the elimination. We also have $w(v,u_2) > w'$, hence $s_1 > s_2$. The claim follows from the
fact that no other edges are affected by the elimination, so

$$|\mathrm{stretch}_{\hat{T}_i}(\hat{G}_i)| - |\mathrm{stretch}_{\hat{T}_{i+1}}(\hat{G}_{i+1})| = s_1 - s_2 > 0.$$

We now consider the case where both edges eliminated in Steps 5-13 are in $\hat{T}_i$. It is clear that $\hat{T}_{i+1}$ is a
spanning tree of $\hat{G}_{i+1}$. Consider any off-tree edge $e$ not in $\hat{T}_{i+1}$. One of its two endpoints must be different
than either $u_1$ or $u_2$, so its endpoints and weight $w_e$ are the same in $\hat{G}_{i+1}$. However the elimination of $v$ may
affect the stretch of $e$ if $\hat{T}_i[e]$ goes through $v$. Let

$$\tau = \Big( \sum_{e' \in \hat{T}_i[e]} 1/w_{e'} \Big) - \big( 1/w(u_1,v) + 1/w(u_2,v) \big)$$
$$\phantom{\tau} = \Big( \sum_{e' \in \hat{T}_{i+1}[e]} 1/w_{e'} \Big) - \Big[ \big( (1/w(u_1,v) + 1/w(u_2,v))^{-1} + w'' \big)^{-1} \Big].$$

We have

$$\frac{\mathrm{stretch}_{\hat{T}_i}(e)}{\mathrm{stretch}_{\hat{T}_{i+1}}(e)} = \frac{w_e \sum_{e' \in \hat{T}_i[e]} 1/w_{e'}}{w_e \sum_{e' \in \hat{T}_{i+1}[e]} 1/w_{e'}} = \frac{\big( 1/w(u_1,v) + 1/w(u_2,v) \big) + \tau}{\big( (1/w(u_1,v) + 1/w(u_2,v))^{-1} + w'' \big)^{-1} + \tau} \geq 1.$$

Since individual edge stretches only decrease, the total stretch also decreases and the claim follows. ■

A preconditioning chain of graphs must satisfy certain properties in order to be useful with R-P-Chebyshev.

Definition 4.2 [Good Preconditioning Chain]

Let $\mathcal{C} = \{G = G_1, H_1, G_2, \ldots, G_d\}$ be a chain of graphs and $\mathcal{K} = \{\kappa_1, \kappa_2, \ldots, \kappa_{d-1}\}$ a list of numbers. We
say that $\{\mathcal{C}, \mathcal{K}\}$ is a good preconditioning chain for $G$, if there exists a list of numbers $\mathcal{U} = \{\mu_1, \mu_2, \ldots, \mu_d\}$
such that:

1. $G_i \preceq H_i \preceq \kappa_i G_i$.

2. $G_{i+1} = $ GreedyElimination$(H_i)$.

3. $\mu_i$ is at least the number of edges in $G_i$.

4. $\mu_1, \mu_2 \leq m$, where $m$ is the number of edges in $G = G_1$.

5. $\mu_i/\mu_{i+1} \geq \lceil c_r \sqrt{\kappa_i} \rceil$ for all $i > 1$ where $c_r$ is an explicitly known constant.

6. $\kappa_i \geq \kappa_{i+1}$.

7. $\mu_d$ is smaller than a fixed constant.

Spielman and Teng [ST06] analyzed the recursive preconditioned Chebyshev iteration R-P-Chebyshev
that can be found in the appendix of [KMP10a] and showed that the solution of an arbitrary SDD system
can be reduced to the computation of a good preconditioning chain. This is captured more concretely by
the following Lemma which is adapted from Theorem 5.5 in [ST06].

Lemma 4.3 Let $A$ be an SDD matrix with $A = L_G + D$ where $D$ is a diagonal matrix with non-negative
elements, and $L_G$ is the Laplacian of a graph $G$. Given a good preconditioning chain $\{\mathcal{C}, \mathcal{K}\}$ for $G$, a vector
$\tilde{x}$ such that $\|\tilde{x} - A^{+}b\|_A < \epsilon \|A^{+}b\|_A$ can be computed in time $O((m\sqrt{\kappa_1} + m\sqrt{\kappa_1 \kappa_2}) \log(1/\epsilon))$.

Before we proceed to the algorithm for building the chain we will need a modified version of a result
by Abraham, Bartal, and Neiman [ABN08], which we prove in Section 5.

Theorem 4.4 There is an algorithm LowStretchTree that, given a graph $G = (V,E,w)$, outputs a
spanning tree $T$ of $G$ such that

$$\sum_{e \in E} \mathrm{stretch}_T(e) \leq O(m \log n \log \log^3 n).$$

The algorithm runs in $O(m \log n + n \log n \log \log n)$ time.

Algorithm BuildChain generates the chain of graphs.



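BuildChain

Input: Graph $G$, scalar $p$ with $0 < p < 1$

Output: Chain of graphs $\mathcal{C} = \{G = G_1, H_1, G_2, \ldots, G_d\}$, List of numbers $\mathcal{K}$.

1: (* $c_{stop}$ and $\kappa_c$ are explicitly known constants *)

2: $G_1 := G$

3: $T := $ LowStretchTree$(G)$

4: $H_1 := G_1 + \tilde{O}(\log^2 n)T$

5: $G_2 := H_1$

6: $\mathcal{K} := \emptyset$; $\mathcal{C} := \emptyset$; $i := 2$

7: $\xi := 2 \log n$

8:

9: (* $n_i$ denotes the number of nodes in $G_i$ *)

10: while $n_i > c_{stop}$ do

11:     $H_i = (V_i, E_{T_i} \cup \mathcal{L}_i) := $ IncrementalSparsify$(G_i, E_{T_i}, \kappa_c, p\xi)$

12:     $\{G_{i+1}, T_{i+1}\} := $ GreedyElimination$(H_i, T_i)$

13:     $\mathcal{C} = \mathcal{C} \cup \{G_i, H_i\}$

14:     $i := i + 1$

15: end while

16: $\mathcal{K} = \{\tilde{O}(\log^2 n), \kappa_c, \kappa_c, \ldots, \kappa_c\}$

17: return $\{\mathcal{C}, \mathcal{K}\}$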
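
The control flow of the main loop (Steps 10 to 15) can be sketched as a small Python skeleton. This is only a structural illustration: the sparsification and elimination routines are passed in as callables, the initial tree scaling of Steps 3 to 5 is left out, and all names and the toy stand-ins are assumptions. The point it shows is that the tree returned by the elimination step is threaded into the next round, so no new low-stretch tree is computed inside the loop.

```python
def build_chain(g, tree, sparsify, eliminate, num_nodes, c_stop):
    """Alternate sparsification and elimination until the graph is small.

    sparsify(g, tree) -> h
    eliminate(h, tree) -> (next_g, next_tree)
    """
    chain = []
    while num_nodes(g) > c_stop:
        h = sparsify(g, tree)
        next_g, next_tree = eliminate(h, tree)
        chain.append((g, h))
        g, tree = next_g, next_tree  # reuse the tree produced by the elimination
    return chain

# Toy stand-ins: a "graph" is just its node count, and elimination halves it.
chain = build_chain(
    g=16,
    tree="T",
    sparsify=lambda g, tree: g,
    eliminate=lambda h, tree: (h // 2, tree),
    num_nodes=lambda g: g,
    c_stop=2,
)
```

With these stand-ins the loop runs three times and stops once the node count reaches the threshold, giving a chain of three $(G_i, H_i)$ pairs.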



It remains to show that our algorithm indeed generates a good preconditioning chain.

Lemma 4.5 Given a graph $G$, BuildChain$(G, p)$ produces with probability at least $1 - p$ a good
preconditioning chain $\{\mathcal{C}, \mathcal{K}\}$ for $G$, such that $\kappa_1 = \tilde{O}(\log^2 n)$ and for all $i \geq 2$, $\kappa_i = \kappa_c$ for some constant $\kappa_c$.
The algorithm runs in time proportional to the running time of LowStretchTree$(G)$.

Proof Let $l_1$ denote the number of edges in $G$ and $l_i = |\mathcal{L}_i|$ the number of off-tree samples for $i > 1$. We
prove by induction on $i$ that:

(a) $l_{i+1} \leq 2l_i/\kappa_c$.

(b) $\mathrm{stretch}_{T_{i+1}}(G_{i+1}) \leq l_i/(C_s \log t_i \log(1/(p\xi))) = \kappa_c \hat{t}_i$, where $C_s$, $\hat{t}_i$ and $t_i$ are as defined in Theorem
3.2 for the graph $G_i$.

For the base case of $i = 1$, by picking a sufficiently large scaling factor $\kappa_1 = \tilde{O}(\log^2 n)$ in Step 4, we can
satisfy claim (b). By Theorem 3.2 it follows that $l_2 \leq 2l_1/\kappa_c$, hence (a) holds. For the inductive argument,
Lemma 3.3 shows that $\mathrm{stretch}_{T_i}(H_i)$ is at most $l_i/(C_s \log t_i \log(1/(p\xi)))$. Then claim (b) follows from
Lemma 4.1 and claim (a) from Theorem 3.2.

We now exhibit the list of numbers $\mathcal{U} = \{\mu_1, \mu_2, \ldots, \mu_d\}$ required by Definition 4.2. A key property
of GreedyElimination is that if $G$ is a graph with $n - 1 + j$ edges, the output $\hat{G}$ of
GreedyElimination$(G)$ has at most $2j - 2$ vertices and $3j - 3$ edges [ST06]. Hence the graph $G_{i+1}$ returned by
GreedyElimination$(H_i)$ has at most $6l_i/\kappa_c$ edges. Therefore setting $\mu_i = 6l_i/\kappa_c$ gives an upper bound
on the number of edges in $G_{i+1}$ and:

At the same time we have $G_i \preceq H_i \preceq 54\kappa_c G_i$. By picking $\kappa_c$ to be large enough we can satisfy all the
requirements for the preconditioning chain.

The probability that $H_i$ has the above properties is by construction at least $1 - p/(2 \log n)$. Since there
are at most $2 \log n$ levels in the chain, the probability that the requirements hold for all $i$ is then at least

$$(1 - p/(2 \log n))^{2 \log n} > 1 - p.$$

Finally note that each call to IncrementalSparsify takes $\tilde{O}(\mu_i \log n \log(1/p))$ time. Since $\mu_i$ decreases
geometrically with $i$, the claim about the running time follows. ■

Combining Lemmas 4.3 and 4.5 proves our main Theorem.

Theorem 4.6 On input an $n \times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a
vector $b$, a vector $\tilde{x}$ satisfying $\|\tilde{x} - A^{+}b\|_A < \epsilon \|A^{+}b\|_A$ can be computed in expected time $\tilde{O}(m \log n \log(1/\epsilon))$.
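
As a brief check of how the two lemmas combine (a sketch that uses only the bounds stated above, with the $\tilde{O}$ absorbing the $\log \log n$ factors and the constant $\sqrt{\kappa_c}$), substituting $\kappa_1 = \tilde{O}(\log^2 n)$ and $\kappa_2 = \kappa_c = O(1)$ from Lemma 4.5 into the bound of Lemma 4.3 gives:

```latex
\begin{align*}
\left(m\sqrt{\kappa_1} + m\sqrt{\kappa_1\kappa_2}\right)\log(1/\epsilon)
  &= m\sqrt{\kappa_1}\left(1 + \sqrt{\kappa_c}\right)\log(1/\epsilon) \\
  &= m \cdot \tilde{O}(\log n) \cdot O(1) \cdot \log(1/\epsilon) \\
  &= \tilde{O}\!\left(m \log n \log(1/\epsilon)\right).
\end{align*}
```

The chain construction time from Lemma 4.5 is proportional to that of LowStretchTree, which by Theorem 4.4 is $O(m \log n + n \log n \log \log n)$ and therefore fits within the same bound.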

5 Speeding Up Low Stretch Spanning Tree Construction 

We improve the running time of the algorithm for finding a low stretch spanning tree given in [EEST05,
ABN08] by a factor of $\log n$, while retaining the $O(m \log n \log \log^3 n)$ bound on total stretch given in
[ABN08]. Specifically, we claim the following Theorem.

Theorem 5.1 There is an algorithm LowStretchTree that given a graph $G = (V,E,w)$, outputs a
spanning tree $T$ of $G$ in $O(m \log n + n \log n \log \log n)$ time such that

$$\sum_{e \in E} \mathrm{stretch}_T(e) \leq O(m \log n \log \log^3 n).$$

We first show that if the graph only has $k$ distinct edge weights, Dijkstra's algorithm can be modified to
run in $O(m + n \log k)$ time. Our approach is identical to the algorithm described in [OMSW10]. However,
we obtain a slight improvement in running time over the $O(m \log k)$ bound given in [OMSW10].

The low stretch spanning tree algorithm in [EEST05, ABN08] makes use of Dijkstra's algorithm, as well as
intermediate stages of it, in the routines BallCut and ConeCut. We first improve the underlying data
structure used by these routines.

Lemma 5.2 There is a data structure that given a list of non-negative values $L = \{l_1 \ldots l_k\}$ (the distinct
edge lengths), maintains a set of keys (distances) starting with $\{0\}$ under the following operations:

1. FindMin(): returns the element with minimum key.

2. DeleteMin(): delete the element with minimum key.

3. Insert($j$): insert the minimum key plus $l_j$ into the set of keys.

4. DecreaseKey($v$, $j$): decrease the key of $v$ to the minimum key plus $l_j$.

Insert and DecreaseKey have $O(1)$ amortized cost and DeleteMin has $O(\log k)$ amortized cost.

Proof We maintain $k$ queues $Q_1 \ldots Q_k$ containing the keys with the invariant that the keys stored in
them are in non-decreasing order. We also maintain a Fibonacci heap as described in [FT87] containing
the first element of all non-empty queues. Since the number of elements in this heap is at most $k$, we
can perform Insert and DecreaseKey in $O(1)$ and DeleteMin in $O(\log k)$ amortized time on these
elements. The invariant then allows us to support FindMin in $O(1)$ time.

Since $l_j \ge 0$, the new key introduced by Insert or DecreaseKey is always at least the minimum
key. Therefore the minimum key is non-decreasing throughout the operations. So if we only append
keys generated by adding $l_j$ to the minimum key to the end of $Q_j$, the invariant that the queues are
monotonically non-decreasing is maintained. Specifically, Insert(j) can be performed by appending a new
entry to the tail of $Q_j$.

For DecreaseKey(v, j), suppose $v$ is currently stored in queue $Q_i$. We consider two cases:

1. $v$ has a predecessor in $Q_i$. Then the key of $v$ is not the key of $Q_i$ in the Fibonacci heap and we can
remove $v$ from $Q_i$ in $O(1)$ time while keeping the invariant. Then we can insert $v$ with its new key
at the end of $Q_j$ using one Insert operation.

2. $v$ is currently at the head of $Q_i$. Then simply decreasing the key of $v$ would not violate the invariant
of all keys in the queues being monotonic. As the new key will be present in the heap containing the
first elements of the queues, a decrease key needs to be performed on the Fibonacci heap containing
those elements.

DeleteMin can be done by doing a delete min in the Fibonacci heap, and removing the element from
the queue containing it. If the queue is still not empty, it can be reinserted into the Fibonacci heap with
key equal to that of its new first element. The amortized cost of this is $O(\log k) + O(1) = O(\log k)$. ■
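
To illustrate how the per-length queues and the heap over queue heads interact, the following is a minimal Python sketch of Dijkstra's algorithm on a graph with few distinct lengths. It is a simplification, not the structure of Lemma 5.2 itself: Python's standard library has no Fibonacci heap, so the sketch uses the binary heap in `heapq` and replaces DecreaseKey by lazy deletion of stale queue entries, which gives $O(m \log k)$ rather than $O(m + n \log k)$. The function name `dijkstra_few_lengths` and the edge-list input format are assumptions made for this example.

```python
import heapq
from collections import deque


def dijkstra_few_lengths(n, edges, source):
    """Distances from `source` in an undirected graph on vertices 0..n-1.

    edges: list of (u, v, length) with length >= 0 (hypothetical input format).
    """
    lengths = sorted({w for _, _, w in edges})        # the k distinct lengths
    index = {w: j for j, w in enumerate(lengths)}
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, index[w]))
        adj[v].append((u, index[w]))

    # queues[j] holds (key, vertex) pairs created with length lengths[j].
    # Keys are appended in non-decreasing order because the minimum key
    # (the distance of the vertex being settled) never decreases.
    queues = [deque() for _ in lengths]
    heads = []                    # heap with one (head key, j) per non-empty queue
    dist = [float("inf")] * n
    done = [False] * n

    def relax_from(u):
        for v, j in adj[u]:
            key = dist[u] + lengths[j]
            if not done[v] and key < dist[v]:
                dist[v] = key
                if not queues[j]:                     # queue j gains a head
                    heapq.heappush(heads, (key, j))
                queues[j].append((key, v))            # Insert(j): append at the tail

    dist[source] = 0
    done[source] = True
    relax_from(source)
    while heads:
        _, j = heapq.heappop(heads)                   # DeleteMin on the heap of heads
        key, v = queues[j].popleft()
        if queues[j]:                                 # re-register the new head of queue j
            heapq.heappush(heads, (queues[j][0][0], j))
        if done[v] or key > dist[v]:                  # stale entry (lazy DecreaseKey)
            continue
        done[v] = True
        relax_from(v)
    return dist
```

For instance, `dijkstra_few_lengths(4, [(0, 1, 1), (1, 2, 2), (0, 2, 5), (2, 3, 1), (1, 3, 5)], 0)` runs with $k = 3$ queues. Attaining the amortized bounds of the lemma would additionally require linked-list queues that support removal from the middle and a heap with constant-time decrease key.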

The running times of Dijkstra's algorithm, BallCut and ConeCut then follow.

Corollary 5.3 Let $G$ be a connected weighted graph and $x_0$ be some vertex. If there are $k$ distinct values
of $d(u, v)$, Dijkstra's algorithm can compute $d(x_0, u)$ for all vertices $u$ in $O(m + n \log k)$ time.

Proof Same as the proof of Dijkstra's algorithm with Fibonacci heap, except the cost of a DeleteMin
is $O(\log k)$.

Corollary 5.4 (Corollary 4.3 of [EEST05]) If there are at most $k$ distinct distances in the graph, then
BallCut returns a ball $X_0$ such that



in $O(\mathrm{vol}(X_0) + |V(X_0)| \log k)$ time.

Corollary 5.5 (Lemma 4.2 of [EEST05]) If there are at most $k$ distinct values in the cone distance $\rho$,
then for any two values $0 \le r_{\min} < r_{\max}$, ConeCut finds a real $r \in [r_{\min}, r_{\max})$ such that

$$\mathrm{cost}(\partial(B_\rho(r, x_0))) \le \frac{\mathrm{vol}(L_r) + \tau}{r_{\max} - r_{\min}}$$

in $O(\mathrm{vol}(B_\rho(r, x_0)) + |V(B_\rho(r, x_0))| \log k)$ time, where $B_\rho(r, x_0)$ is the set of all vertices $v$ within distance
$r$ from $x_0$ in cone length $\rho$.

Proof The existence of such an $r$ follows from Lemma 4.2 of [EEST05], and the running time follows from
the bounds given in Lemma 5.2. ■

We now proceed to show a faster algorithm for constructing low stretch spanning trees by using the
data structure from Lemma 5.2. Our presentation is based on the algorithm described in [ABN08], which
consists of HierarchicalStarPartition at the top level that makes repeated calls to StarPartition.
StarPartition then in turn obtains a desired partition via calls to BallCut and ImpConeDecomp,
which uses ConeCut. Due to space limitations we refer to these routines without stating their parameters
and guarantees.

Lemma 5.6 Given a graph $X$ that has $k$ distinct edge lengths, the version of StarPartition that uses
ImpConeDecomp as stated in Corollary 6 of [ABN08] runs in time $O(\mathrm{vol}(X) + |V(X)| \log k)$.

Proof Finding the radius and calling BallCut takes $O(\mathrm{vol}(X) + |V(X)| \log k)$ time. Since the $X_i$'s form
a partition of the vertices and ImpConeDecomp never reduces the size of a cone, the total cost of all calls
to ImpConeDecomp is

$$\sum_i \left(\mathrm{vol}(X_i) + |V(X_i)| \log k\right) \le \mathrm{vol}(X) + |V(X)| \log k.$$



We now need to ensure that all calls to StarPartition are made with a small value of $k$. This can be
done by rounding the edge lengths so that at any iteration of HierarchicalStarPartition, the graph
has $O(\log n)$ distinct edge weights.



RoundLengths

Input: Graph $G = (V, E, d)$

Output: Rounded graph $\tilde{G} = (V, E, \tilde{d})$

Sort the edge weights of $d$ so that $d(e_1) \le d(e_2) \le \cdots \le d(e_m)$
$i' = 1$
for $i = 1 \ldots m$ do
    if $d(e_i) > 2 d(e_{i'})$ then
        $i' = i$
    end if
    $\tilde{d}(e_i) = d(e_{i'})$
end for
return $\tilde{G} = (V, E, \tilde{d})$
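
The rounding procedure above can be sketched in Python as follows. The function name `round_lengths` and the representation of edges as `(u, v, length)` tuples are assumptions made for this example, and lengths are assumed to be positive.

```python
def round_lengths(edges):
    """Round each length down to the most recent 'base' length in sorted order.

    edges: list of (u, v, length) with length > 0 (hypothetical format).
    Returns a new list in the same order with rounded lengths.
    """
    order = sorted(range(len(edges)), key=lambda i: edges[i][2])
    rounded = list(edges)
    base = None                       # plays the role of d(e_{i'})
    for i in order:
        u, v, d = edges[i]
        if base is None or d > 2 * base:
            base = d                  # start a new length class at this edge
        rounded[i] = (u, v, base)     # rounded length of e_i
    return rounded
```

With this rule every rounded length is at most the original length and at least half of it, and any two distinct rounded lengths differ by a factor of more than two.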



The cost of RoundLengths is dominated by sorting the edge lengths, which takes $O(m \log m)$
time. Before we examine the cost of constructing a low stretch spanning tree on $\tilde{G}$, we show that for any
tree produced in the rounded graph $\tilde{G}$, taking the same set of edges in $G$ gives a tree with similar average
stretch.

Claim 5.7 For each edge $e$, $\frac{1}{2} d(e) \le \tilde{d}(e) \le d(e)$.

Lemma 5.8 Let $T$ be any spanning tree of $(V, E)$, and $u, v$ any pair of vertices. We have

$$\frac{1}{2} d_T(u, v) \le \tilde{d}_T(u, v) \le d_T(u, v).$$

Proof Summing the bound on a single edge over all edges on the tree path suffices. ■

Combining these two gives the following corollary.

Corollary 5.9 For any pair of vertices $u, v$ such that $uv \in E$,

$$\frac{1}{2} \cdot \frac{d_T(u, v)}{d(u, v)} \le \frac{\tilde{d}_T(u, v)}{\tilde{d}(u, v)} \le 2 \cdot \frac{d_T(u, v)}{d(u, v)}.$$

Hence calling HierarchicalStarPartition($\tilde{G}$, $x_0$, Q) and taking the same tree in $G$ gives a low
stretch spanning tree for $G$ with $O(m \log n \log\log^3 n)$ total stretch. It remains to bound the running time.
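
The factor-two relation between stretch under the original and the rounded lengths can be checked numerically. The sketch below is only an illustration: it takes the stretch of an edge to be the ratio of tree-path length to edge length that appears in Corollary 5.9, and the helper names `tree_path_length` and `total_stretch`, as well as the edge-tuple format, are assumptions made here.

```python
def tree_path_length(tree_adj, u, v):
    """Length of the unique u-v path in a tree given as lists of (neighbor, length)."""
    stack = [(u, -1, 0)]
    while stack:
        x, parent, dist = stack.pop()
        if x == v:
            return dist
        for y, d in tree_adj[x]:
            if y != parent:               # never walk back along the edge we came from
                stack.append((y, x, dist + d))
    raise ValueError("v is not reachable from u in the tree")


def total_stretch(n, edges, tree_edge_ids):
    """Sum over all edges (u, v, d) of (tree-path length between u and v) / d.

    edges: list of (u, v, length); tree_edge_ids: indices into `edges`
    that form a spanning tree (hypothetical input format).
    """
    tree_adj = [[] for _ in range(n)]
    for i in tree_edge_ids:
        u, v, d = edges[i]
        tree_adj[u].append((v, d))
        tree_adj[v].append((u, d))
    return sum(tree_path_length(tree_adj, u, v) / d for u, v, d in edges)
```

Evaluating `total_stretch` once with the original lengths and once with the rounded lengths, for the same set of tree edges, should give two values within a factor of two of each other, since the bound holds edge by edge.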

Theorem 5.10 HierarchicalStarPartition($\tilde{G}$, $x_0$, Q) runs in $O(m \log m + n \log m \log\log m)$ time on
the rounded graph $\tilde{G}$.

Proof It was shown in [EEST05] that the lengths of all edges considered at a point where the
farthest vertex from $x_0$ is at distance $r$ lie between $r \cdot n^{-c}$ and $r$, for some constant $c$. The rounding
algorithm ensures that if $\tilde{d}(e_i) \ne \tilde{d}(e_j)$ for some $i < j$, we have $2\tilde{d}(e_i) < \tilde{d}(e_j)$. Therefore in the
range $[r, r \cdot n^{c}]$ (for some value of $r$), there can only be $O(\log n)$ different edge lengths in $\tilde{d}$.
Lemma 5.6 then gives that each call of StarPartition runs in $O(\mathrm{vol}(X) + |V(X)| \log\log n)$ time.
Combining this with the fact that each edge appears in at most $O(\log n)$ layers of the recursion
(Theorem 5.2 of [EEST05]), we get a total running time of
$O(m \log n + n \log n \log\log n)$. ■
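
The counting step in this proof can be made concrete with a small sketch. If consecutive distinct rounded lengths differ by a factor of more than two, then a range $[lo, hi]$ contains at most $\lfloor \log_2(hi/lo) \rfloor + 1$ of them, which is $O(\log n)$ when $hi/lo$ is polynomial in $n$. The function names below are assumptions made for this example.

```python
import math


def distinct_rounded_values(lengths):
    """Distinct values produced by the doubling-based rounding rule, in increasing order."""
    base, values = None, []
    for d in sorted(lengths):
        if base is None or d > 2 * base:
            base = d                  # a new rounded value starts here
            values.append(base)
    return values


def max_distinct_in_range(lo, hi):
    """Upper bound on the number of rounded values in [lo, hi], for 0 < lo <= hi."""
    return math.floor(math.log2(hi / lo)) + 1
```

For example, rounding the lengths 1, 2, ..., 999 leaves the nine values 1, 3, 7, ..., 511, while the bound for the range $[1, 999]$ evaluates to ten.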

6 Discussion 

The output of IncrementalSparsify is a graph of samples with a remarkable property that is a direct
consequence of Lemma 3.3: its further incremental sparsification can be performed by mere uniform
sampling of its off-tree multi-edges.

This leads naturally to the definition of a smooth sequence of (multi)-graphs on a common set of 
vertices, with the following properties: (i) it is of logarithmic size, (ii) the first graph is spine-heavy, (iii) 
every two subsequent graphs have a constant condition number, and (iv) the last graph is a tree. The 
sequence can be obtained by applying one round of IncrementalSparsify to the spine-heavy graph, and 
then O(logn) rounds of uniform sampling. 

Smooth sequences of graphs can be useful in an alternative way of building a chain of preconditioners,
which separates sparsification from greedy elimination. More concretely, the alternative algorithm first
builds a smooth sequence of graphs, starting from the spine-heavy version of the input graph. Then,
somewhat roughly speaking, the final chain is obtained by applying a slightly less aggressive version of
GreedyElimination to each graph in the sequence; this version eliminates degree-one nodes as usual,
but restricts itself to degree-two nodes both of whose adjacent edges are in the low-stretch tree. The simplicity
of this approach is particularly highlighted in the case of low-diameter unweighted graphs. Solving such
graphs has now been essentially reduced to the computation of a BFS tree followed by a number of rounds
of uniform sampling.

We believe that smooth sequences of graphs are a notion of independent interest that may find other
applications.